We have used the India Districts Factsheets data of the National Family Health Survey (NFHS-5), 2019-2021. The dataset provides 137 columns of indicators covering sectors such as education, crime, health and finance for all the states and union territories of India, segregated into Rural, Urban and Total rows. From these, we have chosen to work on the factors related to women, to understand the recent status of women in Indian society.
import pandas as pd
import plotly.express as px
import ipywidgets as widgets
import numpy as np
from ipywidgets import interact,interactive,interact_manual, fixed
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objects as go
%%HTML
<script src="require.js"></script>
from plotly.offline import plot, iplot, init_notebook_mode
init_notebook_mode(connected=True)
df = pd.read_csv("data/NFHS_5_Factsheets_Data.csv")
df.dtypes
States/UTs object
Area object
Number of Households surveyed int64
Number of Women age 15-49 years interviewed int64
Number of Men age 15-54 years interviewed int64
...
Women age 15 years and above who use any kind of tobacco (%) object
Men age 15 years and above who use any kind of tobacco (%) object
Women age 15 years and above who consume alcohol (%) object
Men age 15 years and above who consume alcohol (%) object
Unnamed: 136 float64
Length: 137, dtype: object
df.shape
(110, 137)
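# Replace every cell that is exactly '*' with the string '0'
# (same effect as looping over every df.iloc[i, j], but vectorized).
# Note: in NFHS factsheets '*' usually marks a suppressed estimate (too few cases),
# so treating it as 0 is an analytical choice rather than an observed value.
df = df.replace('*', '0')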
columns = df.columns
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110 entries, 0 to 109
Columns: 137 entries, States/UTs to Unnamed: 136
dtypes: float64(7), int64(3), object(127)
memory usage: 117.9+ KB
df.describe()
| | Number of Households surveyed | Number of Women age 15-49 years interviewed | Number of Men age 15-54 years interviewed | Population below age 15 years (%) | Sex ratio of the total population (females per 1,000 males) | Population living in households with electricity (%) | Population living in households with an improved drinking-water source1 (%) | Population living in households that use an improved sanitation facility2 (%) | Adolescent fertility rate for women age 15-19 years5 | Unnamed: 136 |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 110.000000 | 110.000000 | 110.000000 | 110.000000 | 110.000000 | 110.000000 | 110.000000 | 110.000000 | 110.000000 | 0.0 |
| mean | 22987.354545 | 26136.836364 | 3675.772727 | 24.677273 | 1008.291818 | 98.283636 | 94.687273 | 77.125455 | 32.330000 | NaN |
| std | 76080.721354 | 86810.853826 | 12151.679425 | 4.113630 | 71.953734 | 2.261587 | 5.641574 | 12.762743 | 21.120691 | NaN |
| min | 21.000000 | 26.000000 | 0.000000 | 18.100000 | 775.000000 | 88.900000 | 68.900000 | 34.800000 | 0.000000 | NaN |
| 25% | 2844.000000 | 2797.750000 | 428.500000 | 22.000000 | 964.250000 | 97.700000 | 92.900000 | 69.250000 | 19.000000 | NaN |
| 50% | 8017.500000 | 9047.500000 | 1289.000000 | 23.950000 | 1014.500000 | 99.300000 | 96.350000 | 79.350000 | 27.000000 | NaN |
| 75% | 18758.500000 | 21701.500000 | 3177.750000 | 26.375000 | 1049.750000 | 99.600000 | 98.600000 | 85.975000 | 43.000000 | NaN |
| max | 636699.000000 | 724115.000000 | 101839.000000 | 39.200000 | 1193.000000 | 100.000000 | 100.000000 | 100.000000 | 102.000000 | NaN |
To analyze women's education in different states, we picked four contributing factors: women (age 15-49) who are literate, female population age 6 years and above who ever attended school, women (age 15-49) with 10 or more years of schooling, and women (age 15-49) who have ever used the internet. The last factor was chosen because even a woman who did not attend formal schooling can still educate herself through the internet.
The data is visualized as a spider (radar) chart, with a drop-down menu to select the state.
df['Female population age 6 years and above who ever attended school (%)'].dtype
dtype('O')
# Checking for badly formatted values in the dataset.
for i in range(df['Female population age 6 years and above who ever attended school (%)'].shape[0]):
    cur = df['Female population age 6 years and above who ever attended school (%)'].iloc[i]
    try:
        float(cur)
    except (TypeError, ValueError):
        # Print any value that cannot be parsed as a number, with its row position.
        print(cur, 'at i =', i)
(69.2) at i = 19
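The loop above checks one value at a time. A vectorized alternative is to let `pd.to_numeric(..., errors="coerce")` turn unparsable entries into `NaN` and report the entries that were present before conversion but missing after it. This is a minimal sketch on a small hypothetical Series, not the real column; `find_non_numeric` is an illustrative helper name.

```python
import numpy as np
import pandas as pd


def find_non_numeric(series: pd.Series) -> pd.Series:
    """Return the entries of `series` that cannot be parsed as numbers.

    Entries that are already missing (NaN) are not reported.
    """
    parsed = pd.to_numeric(series, errors="coerce")
    # Non-missing before conversion but missing after it => unparsable text.
    bad_mask = parsed.isna() & series.notna()
    return series[bad_mask]


# Hypothetical values mimicking the formats seen in the factsheet columns.
sample = pd.Series(["71.8", "(69.2)", "83.5", np.nan, "65.6"])
print(find_non_numeric(sample))
```

Applied to the real column, e.g. `find_non_numeric(df['Female population age 6 years and above who ever attended school (%)'])`, it should report the same `(69.2)` entry at index 19, since `df` uses a default RangeIndex.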
df.columns.get_loc('Female population age 6 years and above who ever attended school (%)')
5
# Manually correcting badly formatted values.
df.iloc[19,5]="69.2"
# Converting object into numeric values.
df['Female population age 6 years and above who ever attended school (%)'] = pd.to_numeric(df['Female population age 6 years and above who ever attended school (%)'])
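Instead of patching each parenthesized value by position with `df.iloc`, the parentheses can be stripped from a whole column at once. This sketch assumes the only formatting problems are one pair of surrounding parentheses and stray whitespace, as in the values found above; `clean_percent` is a hypothetical helper name. In NFHS factsheets, values in parentheses are generally those based on a small number (25 to 49) of unweighted cases, so the cleaned numbers keep that caveat.

```python
import pandas as pd


def clean_percent(series: pd.Series) -> pd.Series:
    """Strip whitespace and one pair of surrounding parentheses, then convert to float.

    Anything still unparsable after stripping becomes NaN.
    """
    stripped = (
        series.astype(str)
        .str.strip()
        .str.replace(r"^\((.*)\)$", r"\1", regex=True)
    )
    return pd.to_numeric(stripped, errors="coerce")


sample = pd.Series(["71.8", "(69.2)", " 83.5 ", "n/a"])
print(clean_percent(sample).tolist())  # → [71.8, 69.2, 83.5, nan]
```

On the real data this would be used as `df[col] = clean_percent(df[col])` for each affected column, replacing the per-cell `df.iloc[...]` fixes.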
# Keep only the 'Total' rows (rural and urban combined).
df2 = df[df['Area']== 'Total']
df2.head()
| | States/UTs | Area | Number of Households surveyed | Number of Women age 15-49 years interviewed | Number of Men age 15-54 years interviewed | Female population age 6 years and above who ever attended school (%) | Population below age 15 years (%) | Sex ratio of the total population (females per 1,000 males) | Sex ratio at birth for children born in the last five years (females per 1,000 males) | Children under age 5 years whose birth was registered with the civil authority (%) | ... | Women (age 15-49 years) having a mobile phone that they themselves use (%) | Women age 15-24 years who use hygienic methods of protection during their menstrual period26 (%) | Ever-married women age 18-49 years who have ever experienced spousal violence27 (%) | Ever-married women age 18-49 years who have experienced physical violence during any pregnancy (%) | Young women age 18-29 years who experienced sexual violence by age 18 (%) | Women age 15 years and above who use any kind of tobacco (%) | Men age 15 years and above who use any kind of tobacco (%) | Women age 15 years and above who consume alcohol (%) | Men age 15 years and above who consume alcohol (%) | Unnamed: 136 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | India | Total | 636699 | 724115 | 101839 | 71.8 | 26.5 | 1020.0 | 929 | 89.1 | ... | 54.0 | 77.3 | 29.3 | 3.1 | 1.5 | 8.9 | 38.0 | 1.3 | 18.8 | NaN |
| 5 | Andaman & Nicobar Islands | Total | 2624 | 2397 | 367 | 83.5 | 20.8 | 963.0 | 914 | 97.4 | ... | 80.8 | 98.9 | 17.2 | 0.3 | 1.8 | 31.3 | 58.7 | 5.0 | 39.1 | NaN |
| 8 | Andhra Pradesh | Total | 11346 | 10975 | 1558 | 65.6 | 22.2 | 1045.1 | 933.6 | 92.2 | ... | 48.9 | 85.1 | 30.0 | 3.8 | 3.7 | 3.8 | 22.6 | 0.5 | 23.3 | NaN |
| 11 | Arunachal Pradesh | Total | 18268 | 19765 | 2881 | 71.2 | 27.1 | 997.0 | 979 | 87.7 | ... | 76.4 | 91.8 | 24.8 | 3.0 | 0.7 | 18.8 | 50.3 | 24.2 | 52.7 | NaN |
| 14 | Assam | Total | 30119 | 34979 | 4973 | 78.2 | 28.3 | 1012.0 | 964 | 96.3 | ... | 57.2 | 66.3 | 32.0 | 2.3 | 8.0 | 22.1 | 51.8 | 7.3 | 25.1 | NaN |
5 rows × 137 columns
# Use the States/UTs column as the index.
df2 = df2.set_index(list(df2)[0])
df2.head()
| States/UTs | Area | Number of Households surveyed | Number of Women age 15-49 years interviewed | Number of Men age 15-54 years interviewed | Female population age 6 years and above who ever attended school (%) | Population below age 15 years (%) | Sex ratio of the total population (females per 1,000 males) | Sex ratio at birth for children born in the last five years (females per 1,000 males) | Children under age 5 years whose birth was registered with the civil authority (%) | Deaths in the last 3 years registered with the civil authority (%) | ... | Women (age 15-49 years) having a mobile phone that they themselves use (%) | Women age 15-24 years who use hygienic methods of protection during their menstrual period26 (%) | Ever-married women age 18-49 years who have ever experienced spousal violence27 (%) | Ever-married women age 18-49 years who have experienced physical violence during any pregnancy (%) | Young women age 18-29 years who experienced sexual violence by age 18 (%) | Women age 15 years and above who use any kind of tobacco (%) | Men age 15 years and above who use any kind of tobacco (%) | Women age 15 years and above who consume alcohol (%) | Men age 15 years and above who consume alcohol (%) | Unnamed: 136 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| India | Total | 636699 | 724115 | 101839 | 71.8 | 26.5 | 1020.0 | 929 | 89.1 | 70.8 | ... | 54.0 | 77.3 | 29.3 | 3.1 | 1.5 | 8.9 | 38.0 | 1.3 | 18.8 | NaN |
| Andaman & Nicobar Islands | Total | 2624 | 2397 | 367 | 83.5 | 20.8 | 963.0 | 914 | 97.4 | 90.9 | ... | 80.8 | 98.9 | 17.2 | 0.3 | 1.8 | 31.3 | 58.7 | 5.0 | 39.1 | NaN |
| Andhra Pradesh | Total | 11346 | 10975 | 1558 | 65.6 | 22.2 | 1045.1 | 933.6 | 92.2 | 80.2 | ... | 48.9 | 85.1 | 30.0 | 3.8 | 3.7 | 3.8 | 22.6 | 0.5 | 23.3 | NaN |
| Arunachal Pradesh | Total | 18268 | 19765 | 2881 | 71.2 | 27.1 | 997.0 | 979 | 87.7 | 34.5 | ... | 76.4 | 91.8 | 24.8 | 3.0 | 0.7 | 18.8 | 50.3 | 24.2 | 52.7 | NaN |
| Assam | Total | 30119 | 34979 | 4973 | 78.2 | 28.3 | 1012.0 | 964 | 96.3 | 65.5 | ... | 57.2 | 66.3 | 32.0 | 2.3 | 8.0 | 22.1 | 51.8 | 7.3 | 25.1 | NaN |
5 rows × 136 columns
# Dropdown listing every state/UT in df2's index.
list_widget = widgets.Dropdown(options=np.unique(df2.index))
def spider_chart(state, labels):
    # Draw a radar (spider) chart of the selected indicators for one state/UT.
    cur = df2.loc[state]
    df1 = pd.DataFrame(dict(
        r=[float(cur[x]) for x in labels],
        theta=labels))
    fig = px.line_polar(df1, r='r', theta='theta', line_close=True, title=state)
    fig.update_traces(fill='toself')
    fig.show()
labels=['Female population age 6 years and above who ever attended school (%)','Women (age 15-49) who are literate4 (%)', 'Women (age 15-49) with 10 or more years of schooling (%)', 'Women (age 15-49) who have ever used the internet (%)']
# interact() re-draws the chart whenever the dropdown value changes;
# fixed() passes `labels` through unchanged instead of turning it into a widget.
interact(spider_chart, state=list_widget, labels=fixed(labels))
<function __main__.spider_chart(state, labels)>
From the visualization, in northern states such as Jammu & Kashmir, Himachal Pradesh and Punjab, the percentage of women who have ever used the internet is fairly uniform, lying in the 50-60% range. Jammu & Kashmir also has a relatively lower level of education than the other states, which are at broadly similar levels.
Further east, in states such as Uttar Pradesh and Bihar, all four indicators drop considerably, suggesting that women's education and independence are given less priority. In Bihar and Odisha, women's access to the internet is very poor, only around 20%. Another noticeable point in Bihar is that the percentage of females who ever attended school is higher than the percentage of literate women, which suggests that some women (about 4%) went to school but did not become literate. Because the two indicators cover different age groups (6 years and above vs 15-49 years), this gap is indicative rather than exact.
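The Bihar observation above rests on a simple subtraction: the share of females who ever attended school minus the share of women who are literate. A minimal sketch that computes this difference for every state, assuming a frame indexed by state/UT with the two column names used in `labels`; `school_literacy_gap` is an illustrative name and the toy numbers are hypothetical, not NFHS-5 figures.

```python
import pandas as pd

SCHOOL = "Female population age 6 years and above who ever attended school (%)"
LITERATE = "Women (age 15-49) who are literate4 (%)"


def school_literacy_gap(data: pd.DataFrame) -> pd.Series:
    """Percentage-point difference (ever attended school minus literate), largest first."""
    school = pd.to_numeric(data[SCHOOL], errors="coerce")
    literate = pd.to_numeric(data[LITERATE], errors="coerce")
    return (school - literate).sort_values(ascending=False)


# Hypothetical values for illustration only (not NFHS-5 figures).
toy = pd.DataFrame(
    {SCHOOL: [60.0, 80.0, 75.0], LITERATE: ["56.0", "82.0", "75.0"]},
    index=["State X", "State Y", "State Z"],
)
print(school_literacy_gap(toy))
```

With the real data, `school_literacy_gap(df2)` would list the states where this difference is largest; a positive value hints at, but does not prove, schooling without literacy.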
Things get a little better further east, where all of the indicators rise; this is especially true for Sikkim, where about 80% of women have used the internet and 90% are literate. However, in Tripura and Assam internet access is very low, comparable to Bihar, and only around 20-30% of women have 10 or more years of schooling.
In the central and western states, the indicators are in a similar range: around 30% of women have used the internet and around 40% have 10 or more years of schooling. Goa, however, stands out with relatively high values on all the indicators: about 90% of females have ever attended school, 70% of women have used the internet, and a much higher share, 73%, have 10 or more years of schooling.
Moving south, all the indicators are relatively good compared with the northern and western states, and in Kerala they are remarkably high. Surprisingly, in Telangana internet use is very low (less than 20%), while the share of women with 10 or more years of schooling is comparatively higher, around 45%.
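The dropdown shows one state at a time, which makes the regional comparisons above harder to see side by side. A minimal sketch of one way to overlay several states: reshape the chosen rows and columns into long format so that `px.line_polar` can draw one trace per state via its `color` argument. `radar_long_format` is a hypothetical helper, and the toy frame below stands in for `df2`.

```python
import pandas as pd


def radar_long_format(data: pd.DataFrame, states, labels) -> pd.DataFrame:
    """Reshape selected states (index) and indicators (columns) into long format.

    Returns columns: state, indicator, value (numeric; unparsable -> NaN).
    """
    subset = data.loc[list(states), list(labels)].apply(pd.to_numeric, errors="coerce")
    return (
        subset.rename_axis("state")
        .reset_index()
        .melt(id_vars="state", var_name="indicator", value_name="value")
    )


# Hypothetical stand-in for df2 (indexed by state/UT name).
toy = pd.DataFrame(
    {"Indicator A (%)": [50.0, 70.0, 60.0], "Indicator B (%)": ["40.0", "65.0", "55.0"]},
    index=["State X", "State Y", "State Z"],
)
long_df = radar_long_format(toy, ["State X", "State Y"], ["Indicator A (%)", "Indicator B (%)"])
print(long_df)
```

With the real data, something like `px.line_polar(radar_long_format(df2, ['Kerala', 'Bihar'], labels), r='value', theta='indicator', color='state', line_close=True)` should draw both states on one chart, assuming those index labels exist in `df2`.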
df['Female population age 6 years and above who ever attended school (%)'] = df['Female population age 6 years and above who ever attended school (%)'].astype('float64')
# Making a dataset containing only the Urban and Rural rows, excluding the all-India rows.
# Filtering in one step and taking a .copy() avoids the SettingWithCopyWarning
# raised by an in-place drop on a slice of df.
non_total_data = df.loc[df['Area'].isin(['Rural', 'Urban']) & (df['States/UTs'] != 'India')].copy()
non_total_data['Female population age 6 years and above who ever attended school (%)']
3 86.5
4 81.8
6 75.5
7 61.2
9 83.2
...
103 64.6
105 82.4
106 72.0
108 84.1
109 73.3
Name: Female population age 6 years and above who ever attended school (%), Length: 72, dtype: float64
The following bar plot compares the urban and rural areas of each state/UT on the percentage of females aged 6 years and above who ever attended school, with the states sorted by this percentage.
col = 'Female population age 6 years and above who ever attended school (%)'
# Each state has a Rural and an Urban row, so the sorted state list contains duplicates;
# unique() keeps the first (lowest-value) occurrence of each state for the x-axis order.
order = non_total_data.sort_values(col)['States/UTs'].unique()
sns.catplot(data=non_total_data, x='States/UTs', y=col, hue="Area", kind='bar', height=6, aspect=3, order=order)
plt.xticks(rotation=90)
plt.show()
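The bar plot orders states by their raw percentages. To rank them instead by how far rural areas lag behind urban ones, the Rural and Urban rows can be placed side by side with `DataFrame.pivot`. A minimal sketch, assuming one Rural and one Urban row per state as in `non_total_data`; `urban_rural_gap` is a hypothetical helper and the toy values are illustrative only.

```python
import pandas as pd

COL = "Female population age 6 years and above who ever attended school (%)"


def urban_rural_gap(data: pd.DataFrame, column: str = COL) -> pd.DataFrame:
    """One row per state/UT with Rural, Urban and their difference, largest gap first."""
    # pivot raises an error if a state has two rows for the same Area, a useful sanity check.
    wide = data.pivot(index="States/UTs", columns="Area", values=column)
    wide["Gap (Urban - Rural)"] = wide["Urban"] - wide["Rural"]
    return wide.sort_values("Gap (Urban - Rural)", ascending=False)


# Hypothetical rows in the same shape as non_total_data.
toy = pd.DataFrame({
    "States/UTs": ["State X", "State X", "State Y", "State Y"],
    "Area": ["Rural", "Urban", "Rural", "Urban"],
    COL: [60.0, 80.0, 70.0, 75.0],
})
print(urban_rural_gap(toy))
```

On the real data, `urban_rural_gap(non_total_data)` would list the states with the widest urban-rural gap in school attendance first.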
for i in range(df['Ever-married women age 18-49 years who have ever experienced spousal violence27 (%)'].shape[0]):
    cur = df['Ever-married women age 18-49 years who have ever experienced spousal violence27 (%)'].iloc[i]
    try:
        float(cur)
    except (TypeError, ValueError):
        # Print any value that cannot be parsed as a number, with its row position.
        print(cur, 'at i =', i)
(11.7) at i = 51
(10.3) at i = 76
(13.1) at i = 90
df.columns.get_loc('Ever-married women age 18-49 years who have ever experienced spousal violence27 (%)')
129
df.iloc[51, 129]="11.7"
df.iloc[76, 129]="10.3"
df.iloc[90, 129]="13.1"
df['Ever-married women age 18-49 years who have ever experienced spousal violence27 (%)'] = pd.to_numeric(df['Ever-married women age 18-49 years who have ever experienced spousal violence27 (%)'])
The stacked area plots below show the percentage of women who have experienced spousal, physical and sexual violence in the rural and urban areas of each state/UT.
col = "Ever-married women age 18-49 years who have ever experienced spousal violence27 (%)"
rural = df[df["Area"] == 'Rural']
urban = df[df["Area"] == 'Urban']
# Each trace uses its own state names as x; traces in the same stackgroup are stacked,
# with Rural at the bottom and Urban on top.
# `fig` is used instead of `plot` so that plotly.offline.plot imported above is not shadowed.
fig = go.Figure(data=[
    go.Scatter(x=rural["States/UTs"], y=rural[col], stackgroup='one', name="Rural"),
    go.Scatter(x=urban["States/UTs"], y=urban[col], stackgroup='one', name="Urban"),
])
fig.show()
From the visualization, spousal violence against women is noticeably high in the southern and north-eastern states.
It is highest in Karnataka in the south and in Manipur in the north-east, in both urban and rural areas. Bihar also stands out, with a very high percentage of spousal violence, especially in rural areas.
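The stacked plots pair Rural and Urban values state by state, and the height of the top layer is the sum of two percentages rather than a share of any single population. Before comparing states, it is worth confirming that every state/UT actually has both a Rural and an Urban row. A minimal sketch of that check; `states_missing_area` is a hypothetical helper and the toy rows are illustrative only.

```python
import pandas as pd


def states_missing_area(data: pd.DataFrame, areas=("Rural", "Urban")) -> list:
    """Return the state/UT names that lack a row for at least one of `areas`."""
    counts = (
        data[data["Area"].isin(areas)]
        .groupby("States/UTs")["Area"]
        .nunique()
        # States/UTs with no matching rows at all get a count of 0.
        .reindex(data["States/UTs"].unique(), fill_value=0)
    )
    return sorted(counts[counts < len(areas)].index)


# Hypothetical rows: State Y has no Urban row, State Z only a Total row.
toy = pd.DataFrame({
    "States/UTs": ["State X", "State X", "State X", "State Y", "State Y", "State Z"],
    "Area": ["Rural", "Urban", "Total", "Rural", "Total", "Total"],
})
print(states_missing_area(toy))  # → ['State Y', 'State Z']
```

An empty list from `states_missing_area(df)` would mean every state/UT has both rows, so each stacked column combines a matching pair of values.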
for i in range(df['Ever-married women age 18-49 years who have experienced physical violence during any pregnancy (%)'].shape[0]):
    cur = df['Ever-married women age 18-49 years who have experienced physical violence during any pregnancy (%)'].iloc[i]
    try:
        float(cur)
    except (TypeError, ValueError):
        # Print any value that cannot be parsed as a number, with its row position.
        print(cur, 'at i =', i)
(0.0) at i = 3
(0.0) at i = 51
(0.0) at i = 76
(0.4) at i = 90
df.columns.get_loc('Ever-married women age 18-49 years who have experienced physical violence during any pregnancy (%)')
130
df.iloc[3, 130]="0.0"
df.iloc[51, 130]="0.0"
df.iloc[76, 130]="0.0"
df.iloc[90, 130]="0.4"
df['Ever-married women age 18-49 years who have experienced physical violence during any pregnancy (%)'] = pd.to_numeric(df['Ever-married women age 18-49 years who have experienced physical violence during any pregnancy (%)'])
col = "Ever-married women age 18-49 years who have experienced physical violence during any pregnancy (%)"
rural = df[df["Area"] == 'Rural']
urban = df[df["Area"] == 'Urban']
# Stacked area plot: Rural at the bottom, Urban on top, each trace with its own x values.
fig = go.Figure(data=[
    go.Scatter(x=rural["States/UTs"], y=rural[col], stackgroup='one', name="Rural"),
    go.Scatter(x=urban["States/UTs"], y=urban[col], stackgroup='one', name="Urban"),
])
fig.show()
From the visualization, physical violence during pregnancy is generally reported by a higher percentage of women in rural India than in urban India.
It is highest in Karnataka in the south, followed by Bihar and Manipur, while Ladakh has the lowest percentage of women who suffered physical violence during pregnancy.
for i in range(df['Young women age 18-29 years who experienced sexual violence by age 18 (%)'].shape[0]):
    cur = df['Young women age 18-29 years who experienced sexual violence by age 18 (%)'].iloc[i]
    try:
        float(cur)
    except (TypeError, ValueError):
        # Print any value that cannot be parsed as a number, with its row position.
        print(cur, 'at i =', i)
(6.1) at i = 51
(0.0) at i = 82
(3.2) at i = 90
df.columns.get_loc('Young women age 18-29 years who experienced sexual violence by age 18 (%)')
131
df.iloc[51, 131]="6.1"
df.iloc[82, 131]="0.0"
df.iloc[90, 131]="3.2"
df['Young women age 18-29 years who experienced sexual violence by age 18 (%)'] = pd.to_numeric(df['Young women age 18-29 years who experienced sexual violence by age 18 (%)'])
col = "Young women age 18-29 years who experienced sexual violence by age 18 (%)"
rural = df[df["Area"] == 'Rural']
urban = df[df["Area"] == 'Urban']
# Stacked area plot: Rural at the bottom, Urban on top, each trace with its own x values.
fig = go.Figure(data=[
    go.Scatter(x=rural["States/UTs"], y=rural[col], stackgroup='one', name="Rural"),
    go.Scatter(x=urban["States/UTs"], y=urban[col], stackgroup='one', name="Urban"),
])
fig.show()
Here we see trends similar to the previous plots, except that the lowest percentage is now in Odisha rather than Ladakh: women in Odisha report the least sexual violence in both rural and urban areas.
Karnataka, Bihar, Manipur and West Bengal have a very high percentage of women suffering from sexual, physical and spousal violence compared with other states. The governments of these states should therefore enforce the relevant laws strictly so that women feel safe.
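The claim that a few states are high on all three violence indicators can be checked programmatically by ranking every state on each indicator and keeping those that fall within the top N on all of them. A minimal sketch, assuming a frame indexed by state/UT whose columns are the three violence indicators; `consistently_high` is a hypothetical helper, `top_n` is an arbitrary cutoff, and the toy numbers are not NFHS-5 values.

```python
import pandas as pd


def consistently_high(data: pd.DataFrame, columns, top_n: int = 5) -> list:
    """Return the states/UTs ranked within the top `top_n` on every column in `columns`."""
    numeric = data[list(columns)].apply(pd.to_numeric, errors="coerce")
    # Rank 1 = highest value; NaN ranks stay NaN and never count as "in the top N".
    ranks = numeric.rank(ascending=False, method="min")
    in_top = (ranks <= top_n).all(axis=1)
    return sorted(data.index[in_top])


# Hypothetical values for illustration only.
toy = pd.DataFrame(
    {"Spousal": [30, 10, 20, 25], "Physical": [5, 1, 4, 2], "Sexual": [9, 3, 8, 1]},
    index=["State P", "State Q", "State R", "State S"],
)
print(consistently_high(toy, ["Spousal", "Physical", "Sexual"], top_n=2))  # → ['State P']
```

On the real data one might pass, for example, the Rural rows indexed by state (`df[df['Area'] == 'Rural'].set_index('States/UTs')`) with the three violence column names and a cutoff such as `top_n=10`.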
We will extend the analysis to more sectors related to women, such as finance and basic necessities, to reach a conclusion about their overall status.